AITopics | subtask 2

Collaborating Authors

subtask 2

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Explaining novel senses using definition generation with open language models

Fedorova, Mariia, Kutuzov, Andrey, Periti, Francesco, Scherrer, Yves

arXiv.org Artificial IntelligenceOct-2-2025

We apply definition generators based on open-weights large language models to the task of creating explanations of novel senses, taking target word usages as an input. To this end, we employ the datasets from the AXOLOTL'24 shared task on explainable semantic change modeling, which features Finnish, Russian and German languages. We fine-tune and provide publicly the open-source models performing higher than the best submissions of the aforementioned shared task, which employed closed proprietary LLMs. In addition, we find that encoder-decoder definition generators perform on par with their decoder-only counterparts.

axolotl, large language model, natural language, (18 more...)

arXiv.org Artificial Intelligence

2509.26181

Country:

Asia > Thailand > Bangkok > Bangkok (0.05)
North America > Canada > Ontario > Toronto (0.04)
North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.04)
(8 more...)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.89)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.68)

Add feedback

mdok of KInIT: Robustly Fine-tuned LLM for Binary and Multiclass AI-Generated Text Detection

Macko, Dominik

arXiv.org Artificial IntelligenceSep-25-2025

The large language models (LLMs) are able to generate high-quality texts in multiple languages. Such texts are often not recognizable by humans as generated, and therefore present a potential of LLMs for misuse (e.g., plagiarism, spams, disinformation spreading). An automated detection is able to assist humans to indicate the machine-generated texts; however, its robustness to out-of-distribution data is still challenging. This notebook describes our mdok approach in robust detection, based on fine-tuning smaller LLMs for text classification. It is applied to both subtasks of Voight-Kampff Generative AI Detection 2025, providing remarkable performance (1st rank) in both, the binary detection as well as the multiclass classification of various cases of human-AI collaboration.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2506.01702

Country:

Asia > Thailand > Bangkok > Bangkok (0.04)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > Florida > Miami-Dade County > Miami (0.04)
(8 more...)

Genre: Research Report (0.52)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.50)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.47)

Add feedback

AIxcellent Vibes at GermEval 2025 Shared Task on Candy Speech Detection: Improving Model Performance by Span-Level Training

Thelen, Christian Rene, Blaneck, Patrick Gustav, Bornheim, Tobias, Grieger, Niklas, Bialonski, Stephan

arXiv.org Artificial IntelligenceSep-17-2025

Positive, supportive online communication in social media (candy speech) has the potential to foster civility, yet automated detection of such language remains underexplored, limiting systematic analysis of its impact. We investigate how candy speech can be reliably detected in a 46k-comment German YouTube corpus by monolingual and multilingual language models, including GBERT, Qwen3 Embedding, and XLM-RoBERTa. We find that a multilingual XLM-RoBERTa-Large model trained to detect candy speech at the span level outperforms other approaches, ranking first in both binary positive F1: 0.8906) and categorized span-based detection (strict F1: 0.6307) subtasks at the GermEval 2025 Shared Task on Candy Speech Detection. We speculate that span-based training, multilingual capabilities, and emoji-aware tokenizers improved detection performance. Our results demonstrate the effectiveness of multilingual models in identifying positive, supportive language.

computational linguistic, large language model, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2509.07459

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Europe > Austria > Vienna (0.14)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
(8 more...)

Genre: Research Report > New Finding (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Communications > Social Media (0.89)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.47)

Add feedback

AraHealthQA 2025: The First Shared Task on Arabic Health Question Answering

Alhuzali, Hassan, Al-Eisawi, Walid, Abdul-Mageed, Muhammad, Abouzahir, Chaimae, Abu-Daoud, Mouath, Alasmari, Ashwag, Al-Monef, Renad, Alqahtani, Ali, Ayash, Lama, Kharouf, Leen, Shamout, Farah E., Habash, Nizar

arXiv.org Artificial IntelligenceSep-16-2025

We introduce AraHealthQA 2025, the Comprehensive Arabic Health Question Answering Shared Task, held in conjunction with ArabicNLP 2025 (co-located with EMNLP 2025). This shared task addresses the paucity of high-quality Arabic medical QA resources by offering two complementary tracks: MentalQA, focusing on Arabic mental health Q&A (e.g., anxiety, depression, stigma reduction), and MedArabiQ, covering broader medical domains such as internal medicine, pediatrics, and clinical decision making. Each track comprises multiple subtasks, evaluation datasets, and standardized metrics, facilitating fair benchmarking. The task was structured to promote modeling under realistic, multilingual, and culturally nuanced healthcare contexts. We outline the dataset creation, task design and evaluation framework, participation statistics, baseline systems, and summarize the overall outcomes. We conclude with reflections on the performance trends observed and prospects for future iterations in Arabic health QA.

large language model, machine learning, question answering, (17 more...)

arXiv.org Artificial Intelligence

2508.20047

Country:

Africa > Middle East > Egypt > Cairo Governorate > Cairo (0.04)
Europe > United Kingdom > Wales (0.04)
Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.04)
(7 more...)

Genre: Research Report (0.64)

Industry: Health & Medicine > Therapeutic Area > Psychiatry/Psychology (0.72)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Question Answering (0.87)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.70)

Add feedback

A Multi-Strategy Approach for AI-Generated Text Detection

Zain, Ali, Farooqui, Sareem, Rafi, Muhammad

arXiv.org Artificial IntelligenceSep-3-2025

This paper presents presents three distinct systems developed for the M-DAIGT shared task on detecting AI generated content in news articles and academic abstracts. The systems includes: (1) A fine-tuned RoBERTa-base classifier, (2) A classical TF-IDF + Support Vector Machine (SVM) classifier , and (3) An Innovative ensemble model named Candace, leveraging probabilistic features extracted from multiple Llama-3.2 models processed by a customTransformer encoder.The RoBERTa-based system emerged as the most performant, achieving near-perfect results on both development and test sets.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2509.00623

Country:

North America > Mexico > Mexico City > Mexico City (0.04)
Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
Asia > Singapore (0.04)
Asia > Pakistan > Sindh > Karachi Division > Karachi (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.96)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.90)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.78)

Add feedback

CareLab at #SMM4H-HeaRD 2025: Insomnia Detection and Food Safety Event Extraction with Domain-Aware Transformers

Liang, Zihan, Pan, Ziwen, Dey, Sumon Kanti, Ismail, Azra

arXiv.org Artificial IntelligenceJun-24-2025

This paper presents our system for the SMM4H-HeaRD 2025 shared tasks, specifically Task 4 (Subtasks 1, 2a, and 2b) and Task 5 (Subtasks 1 and 2). Task 4 focused on detecting mentions of insomnia in clinical notes, while Task 5 addressed the extraction of food safety events from news articles. We participated in all subtasks and report key findings across them, with particular emphasis on Task 5 Subtask 1, where our system achieved strong performance--securing first place with an F1 score of 0.958 on the test set. To attain this result, we employed encoder-based models (e.g., RoBERTa), alongside GPT -4 for data augmentation. This paper outlines our approach, including preprocessing, model architecture, and subtask-specific adaptations.

classification, machine learning, natural language, (16 more...)

arXiv.org Artificial Intelligence

2506.18185

Country:

North America > United States > Georgia > Fulton County > Atlanta (0.04)
Asia > Middle East > Qatar > Ad-Dawhah > Doha (0.04)

Genre: Research Report > New Finding (0.46)

Industry:

Health & Medicine > Therapeutic Area > Psychiatry/Psychology (1.00)
Health & Medicine > Therapeutic Area > Neurology (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.90)

Add feedback

WorldGUI: Dynamic Testing for Comprehensive Desktop GUI Automation

Zhao, Henry Hengyuan, Gao, Difei, Shou, Mike Zheng

arXiv.org Artificial IntelligenceFeb-11-2025

Current GUI agents have achieved outstanding performance in GUI element grounding. However, planning remains highly challenging, especially due to sensitivity to the initial state of the environment. Specifically, slight differences in the initial state-such as the target software not being open or the interface not being in its default state-often lead to planning errors. This issue is widespread in real user scenarios, but existing benchmarks fail to evaluate it. In this paper, we present WorldGUI, a novel GUI benchmark that designs GUI tasks with various initial states to simulate real computer-user interactions. The benchmark spans a wide range of tasks across 10 popular software applications, including PowerPoint, VSCode, and Adobe Acrobat. In addition, to address the challenges of dynamic GUI automation tasks, we propose GUI-Thinker, a holistic framework, leveraging a critique mechanism, that effectively manages the unpredictability and complexity of GUI interactions. Experimental results demonstrate that GUI-Thinker significantly outperforms Claude-3.5 (Computer Use) by 14.9% in success rate on WorldGUI tasks. This improvement underscores the effectiveness of our critical-thinking-based framework in enhancing GUI automation.

large language model, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2502.08047

Country: Asia > Singapore (0.04)

Genre: Research Report > New Finding (0.48)

Industry: Education > Educational Technology (0.31)

Technology:

Information Technology > Graphics (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
(3 more...)

Add feedback

Online Social Support Detection in Spanish Social Media Texts

Tash, Moein Shahiki, Ramos, Luis, Ahani, Zahra, Monroy, Raul, kolesnikova, Olga, Calvo, Hiram, Sidorov, Grigori

arXiv.org Artificial IntelligenceFeb-9-2025

The advent of social media has transformed communication, enabling individuals to share their experiences, seek support, and participate in diverse discussions. While extensive research has focused on identifying harmful content like hate speech, the recognition and promotion of positive and supportive interactions remain largely unexplored. This study proposes an innovative approach to detecting online social support in Spanish-language social media texts. We introduce the first annotated dataset specifically created for this task, comprising 3,189 YouTube comments classified as supportive or non-supportive. To address data imbalance, we employed GPT-4o to generate paraphrased comments and create a balanced dataset. We then evaluated social support classification using traditional machine learning models, deep learning architectures, and transformer-based models, including GPT-4o, but only on the unbalanced dataset. Subsequently, we utilized a transformer model to compare the performance between the balanced and unbalanced datasets. Our findings indicate that the balanced dataset yielded improved results for Task 2 (Individual and Group) and Task 3 (Nation, Other, LGBTQ, Black Community, Women, Religion), whereas GPT-4o performed best for Task 1 (Social Support and Non-Support). This study highlights the significance of fostering a supportive online environment and lays the groundwork for future research in automated social support detection.

accuracy, f1-score, weighted f1-score, (16 more...)

arXiv.org Artificial Intelligence

2502.0964

Country:

South America (0.04)
North America > Mexico > Mexico City > Mexico City (0.04)
North America > Central America (0.04)
(2 more...)

Genre: Research Report > New Finding (0.89)

Industry:

Health & Medicine > Therapeutic Area > Psychiatry/Psychology (0.46)
Law > Civil Rights & Constitutional Law (0.46)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

JuniperLiu at CoMeDi Shared Task: Models as Annotators in Lexical Semantics Disagreements

Liu, Zhu, Hu, Zhen, Liu, Ying

arXiv.org Artificial IntelligenceDec-30-2024

We present the results of our system for the CoMeDi Shared Task, which predicts majority votes (Subtask 1) and annotator disagreements (Subtask 2). Our approach combines model ensemble strategies with MLP-based and threshold-based methods trained on pretrained language models. Treating individual models as virtual annotators, we simulate the annotation process by designing aggregation measures that incorporate continuous relatedness scores and discrete classification labels to capture both majority and disagreement. Additionally, we employ anisotropy removal techniques to enhance performance. Experimental results demonstrate the effectiveness of our methods, particularly for Subtask 2. Notably, we find that standard deviation on continuous relatedness scores among different model manipulations correlates with human disagreement annotations compared to metrics on aggregated discrete labels. The code will be published at https://github.com/RyanLiut/CoMeDi_Solution.

computational linguistic, disagreement, subtask 2, (13 more...)

arXiv.org Artificial Intelligence

2411.12147

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.14)
(13 more...)

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)

Add feedback

A Simple Yet Effective Corpus Construction Framework for Indonesian Grammatical Error Correction

Lin, Nankai, Zeng, Meiyu, Huang, Wentao, Jiang, Shengyi, Xiao, Lixian, Yang, Aimin

arXiv.org Artificial IntelligenceOct-28-2024

Currently, the majority of research in grammatical error correction (GEC) is concentrated on universal languages, such as English and Chinese. Many low-resource languages lack accessible evaluation corpora. How to efficiently construct high-quality evaluation corpora for GEC in low-resource languages has become a significant challenge. To fill these gaps, in this paper, we present a framework for constructing GEC corpora. Specifically, we focus on Indonesian as our research language and construct an evaluation corpus for Indonesian GEC using the proposed framework, addressing the limitations of existing evaluation corpora in Indonesian. Furthermore, we investigate the feasibility of utilizing existing large language models (LLMs), such as GPT-3.5-Turbo and GPT-4, to streamline corpus annotation efforts in GEC tasks. The results demonstrate significant potential for enhancing the performance of LLMs in low-resource language settings. Our code and corpus can be obtained from https://github.com/GKLMIP/GEC-Construction-Framework.

large language model, machine learning, natural language, (16 more...)

arXiv.org Artificial Intelligence

2410.20838

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Asia > China > Guangdong Province > Guangzhou (0.05)
North America > United States > Maryland > Baltimore (0.04)
(6 more...)

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback